
Abstract. We consider the problem of global optimization of a function $f$ from 
very noisy evaluations. We adopt a Bayesian sequential approach: evaluation points are 
chosen so as to reduce the uncertainty about the position of the global optimum of $f$, as 
measured by the entropy of the corresponding random variable (Informational Approach 
to Global Optimization, Villemonteix et al., 2009). When evaluations are very noisy, the 
error coming from the estimation of the entropy using conditional simulations becomes 
non-negligible compared to its variations on the input domain. We propose a solution 
to this problem by choosing evaluation points as if several evaluations were going to be 
made at these points. The method is applied to the optimization of a strategy for the 
integration of renewable energies into an electrical distribution network. 

Keywords. Gaussian processes; Design and Analysis of Computer Experiments; 
Bayesian Optimization; Renewable Energies; Electrical Distribution Network. 
1 Introduction 


Let $f$ be a continuous real-valued function, defined on $\mathbb{R}^d$ (or a subset of 
$\mathbb{R}^d$), $d \geq 1$. Given a finite set $\mathbb{X} \subset \mathbb{R}^d$, we consider the problem of 
estimating the minimum $M = \min_{x \in \mathbb{X}} f(x)$ and the corresponding set of minimizers, 
$x^\star \in \operatorname{argmin}_{x \in \mathbb{X}} f(x)$, using a sequence of evaluations of $f$ at points 
$X_1, X_2, \ldots, X_n \in \mathbb{X}$. In this article, the evaluation results are assumed noisy: at 
each $X_i$, we observe a perturbed value of $f(X_i)$. The construction of an optimization 
algorithm $\underline{X} = (X_1, X_2, \ldots)$ is viewed as a sequential decision problem: given $n$ 
(noisy) evaluation results at $X_1, \ldots, X_n$, we must choose $X_{n+1}$ in order to get, in the 
end, the best estimators of $x^\star$ and $M$ according to a certain loss function. 

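We adopt the following (classical) Bayesian approach for constructing $\underline{X}$. The 
unknown function $f$ is considered as a sample path of a Gaussian random process $\xi$ 
defined on some probability space $(\Omega, \mathcal{B}, \mathsf{P}_0)$, with parameter $x \in \mathbb{X}$. Then, a noisy 
evaluation of $f$ at $X_i \in \mathbb{X}$ is modeled by the random variable 
$\xi_i^{\mathrm{obs}} := \xi(X_i) + \varepsilon_i$, $i = 1, 2, \ldots$, with $\varepsilon_1, \varepsilon_2, \ldots \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2)$ (here, $\sigma^2$ is 
assumed to be known). Denote by $\mathsf{P}_n$ the conditional distribution $\mathsf{P}_0(\,\cdot \mid \mathcal{I}_n)$, where 
$\mathcal{I}_n = (X_1, \xi_1^{\mathrm{obs}}, \ldots, X_n, \xi_n^{\mathrm{obs}})$, and by $\mathsf{E}_n$ and $\mathrm{var}_n$ the conditional expectation 
$\mathsf{E}(\,\cdot \mid \mathcal{I}_n)$ and conditional variance $\mathrm{var}(\,\cdot \mid \mathcal{I}_n)$ respectively. 
Following Villemonteix et al. (2009) and Vazquez et al. (2008), the efficiency of an 
algorithm $\underline{X}$ after $n$ evaluations is measured using the posterior Shannon entropy 

$$H(x^\star; \mathcal{I}_n) = -\sum_{x \in \mathbb{X}} \mathsf{P}_n(x^\star = x) \log \mathsf{P}_n(x^\star = x), \qquad (1)$$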


which quantifies the residual uncertainty about the position of $x^\star$. Then, each new 
evaluation point is chosen using a Stepwise Uncertainty Reduction (SUR) approach, which 
consists in minimizing a sampling criterion that corresponds to the expected residual 
uncertainty on $x^\star$ after $n + 1$ evaluation results: 

$$X_{n+1} = \operatorname{argmin}_{x \in \mathbb{X}} J_n(x) \quad \text{with} \quad J_n(x) := \mathsf{E}_n\bigl(H(x^\star; \mathcal{I}_{n+1}) \mid X_{n+1} = x\bigr). \qquad (2)$$


Notice that $J_n(x)$ is an expectation with respect to the random evaluation result 
$\xi_{n+1}^{\mathrm{obs}}$ at $X_{n+1} = x$. Minimizing $J_n$ is equivalent to maximizing the mutual information 
between $x^\star$ and $\xi_{n+1}^{\mathrm{obs}}$. The reader is referred to Picheny et al. (2013) for a review of 
other sampling criteria for noisy optimization. 
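
The equivalence with mutual information can be made explicit with a standard identity from information theory. Writing $\xi_{n+1}^{\mathrm{obs}}$ for the noisy evaluation result at $X_{n+1} = x$, and $\mathrm{MI}_n$ (a symbol introduced only for this remark) for the mutual information under the posterior distribution given the first $n$ evaluation results:

$$\mathrm{MI}_n\bigl(x^\star; \xi_{n+1}^{\mathrm{obs}} \mid X_{n+1} = x\bigr) = H(x^\star; \mathcal{I}_n) - J_n(x).$$

Since the current entropy $H(x^\star; \mathcal{I}_n)$ does not depend on the candidate point $x$, the point that minimizes $J_n$ is also the point that maximizes the mutual information.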

From a numerical point of view, the computation of $J_n$ is based on two approximations. 
A first approximation is required for the computation of the expectation in (2) with 
respect to the posterior distribution of $\xi_{n+1}^{\mathrm{obs}}$ at $X_{n+1} = x$. Since $\xi$ and the evaluation 
noise are Gaussian, the expectation in (2) is a one-dimensional integral with respect to 
the Gaussian posterior density of $\xi_{n+1}^{\mathrm{obs}}$, which can be carried out with a standard 
Gauss-Hermite quadrature. A second approximation is needed to compute the entropy of the 
posterior distribution of $x^\star$. Villemonteix et al. (2009) estimate this entropy by plugging 
into (1) an estimator of $\mathsf{P}_n(x^\star = x)$, with $x$ ranging over $\mathbb{X}$, which, in turn, is estimated 
by Monte-Carlo simulations of sample paths of $\xi$ conditioned on $\mathcal{I}_n$. 
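
As an illustration of the first approximation, the following minimal sketch evaluates a one-dimensional Gaussian expectation by Gauss-Hermite quadrature with NumPy. The function name `gauss_hermite_expectation` and the test integrands are assumptions made for this example; in the setting above, the integrand would be the (estimated) entropy obtained after observing a given value at the candidate point.

```python
import numpy as np

def gauss_hermite_expectation(g, mean, var, order=15):
    """Approximate E[g(Z)] for Z ~ N(mean, var) by Gauss-Hermite quadrature."""
    # hermgauss returns nodes t_i and weights w_i for the weight function exp(-t^2).
    nodes, weights = np.polynomial.hermite.hermgauss(order)
    # The change of variable z = mean + sqrt(2 * var) * t maps that weight
    # function onto the N(mean, var) density, up to a factor sqrt(pi).
    z = mean + np.sqrt(2.0 * var) * nodes
    values = np.array([g(zi) for zi in z])
    return float(np.dot(weights, values) / np.sqrt(np.pi))

# Sanity checks on integrands with known expectations (illustrative only).
second_moment = gauss_hermite_expectation(lambda z: z ** 2, mean=1.0, var=4.0)
exp_moment = gauss_hermite_expectation(np.exp, mean=0.0, var=1.0)
print(second_moment, exp_moment)
```

The first check should reproduce $\mathrm{mean}^2 + \mathrm{var}$ and the second $\exp(1/2)$, since a rule of order 15 is exact for polynomials of degree up to 29 and very accurate for smooth integrands.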

When evaluations are noise-free, it is often possible to obtain a satisfactory estimator 
of the entropy with a moderately large number of sample paths (about 1000). However, when 
the evaluation noise becomes large, it appears that, for the same moderately large number 
of sample paths, the variance of estimation of the entropy becomes non-negligible with 
respect to the information provided by a single evaluation. Then, minimizing $J_n$ to choose 
new evaluation points becomes questionable. In this article, we propose to circumvent 
this problem with a new sampling criterion where, in essence, we pretend that several 
evaluations are going to be carried out instead of a single one. 
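
The Monte-Carlo variability discussed above can be observed on a toy problem. The sketch below is not the authors' implementation: the squared-exponential covariance, its length scale, the toy objective and the helper names `posterior_on_grid` and `entropy_estimate` are all assumptions made for illustration. It computes the plug-in entropy estimate from 1000 conditional sample paths and repeats the estimation 15 times to expose its spread.

```python
import numpy as np

def posterior_on_grid(k_prior, idx_obs, y_obs, noise_var):
    """Posterior mean and covariance of a zero-mean GP on a grid, from noisy observations."""
    k_oo = k_prior[np.ix_(idx_obs, idx_obs)] + noise_var * np.eye(len(idx_obs))
    k_go = k_prior[:, idx_obs]
    gain = np.linalg.solve(k_oo, k_go.T).T      # equals k_go @ inv(k_oo), k_oo being symmetric
    cov = k_prior - gain @ k_go.T
    return gain @ y_obs, 0.5 * (cov + cov.T)    # symmetrize against round-off

def entropy_estimate(mean, cov, n_paths, rng):
    """Plug-in estimate of the entropy of the minimizer from conditional sample paths."""
    paths = rng.multivariate_normal(mean, cov, size=n_paths)
    counts = np.bincount(paths.argmin(axis=1), minlength=len(mean))
    p = counts[counts > 0] / n_paths            # empirical P_n(x* = x), empty cells dropped
    return float(-np.sum(p * np.log(p)))

rng = np.random.default_rng(0)
grid = np.linspace(-1.0, 0.0, 51)
# Assumed prior covariance: squared exponential, plus a small jitter for stability.
k_prior = np.exp(-0.5 * ((grid[:, None] - grid[None, :]) / 0.2) ** 2) + 1e-8 * np.eye(51)
idx_obs = np.arange(0, 51, 5)                   # 11 observed grid points
noise_var = 1.0                                 # large compared with the range of the toy objective
y_obs = (grid[idx_obs] + 0.4) ** 2 + rng.normal(scale=np.sqrt(noise_var), size=idx_obs.size)

mean, cov = posterior_on_grid(k_prior, idx_obs, y_obs, noise_var)
estimates = np.array([entropy_estimate(mean, cov, 1000, rng) for _ in range(15)])
print(f"entropy estimates: mean {estimates.mean():.3f}, std {estimates.std(ddof=1):.3f}")
```

The standard deviation printed at the end is the quantity that has to be compared with the entropy reduction expected from one additional evaluation.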


2 The Informational Approach to Global Optimization with (very) noisy evaluations 


Since a single noisy evaluation provides limited information about $x^\star$, and therefore 
yields by itself little progress in the optimization procedure, the variations of $J_n$ on $\mathbb{X}$ 
can be dominated by its estimation error (as illustrated in the leftmost panel of Figure 1). 

A natural idea to gain more information from noisy evaluations is to perform several 
evaluations at each iteration of the optimization algorithm. Our contribution is as follows: 
we suggest building a sampling criterion $J_n^K$ such that, for all $x \in \mathbb{X}$, $J_n^K(x)$ corresponds 
to the expected residual uncertainty about $x^\star$ when $K$ (noisy) evaluations of $f$ are 
performed at $x$: 

$$J_n^K(x) := \mathsf{E}_n\bigl(H(x^\star; \mathcal{I}_{n+K}) \mid X_{n+1} = \cdots = X_{n+K} = x\bigr). \qquad (3)$$

The resulting criterion is illustrated in Figure 1 with $K$ equal to 10, 100 and $+\infty$. 

We refer to $K$ as the virtual batch size, since we do not actually intend to perform $K$ 
evaluations at the minimizer $X_{n+1}$ of $J_n^K$. Once $X_{n+1}$ has been obtained by minimizing (3), 
any number $K_0$ of evaluations (between $K_0 = 1$, as assumed in Section 1, and $K_0 = +\infty$) 
can actually be performed at this point; this number $K_0$ is the actual batch size. We 
suggest taking $K$ large enough to make the error of estimation of $J_n^K$ small with respect 
to the variations of the criterion, and carrying out only one actual evaluation ($K_0 = 1$) 
at each iteration if evaluations are very expensive, or a batch of size $K_0 > 1$ (typically, 
$K_0 \ll K$) if evaluations are only moderately expensive or if parallel processing is available. 
Another possibility would be to update $K$ at each iteration so as to consider the whole 
remaining budget of evaluations, as suggested in Picheny et al. (2010). 

Figure 1: Realizations of the numerical estimate of the sampling criterion $J_n^K$ for the data shown in 
Figure 2 (right). Each figure represents 15 independent realizations (corresponding to independent 
samples of conditional simulations). The batch size is, from left to right: $K = 1, 10, 100$ and $+\infty$. A 
standard 15-order Gauss-Hermite quadrature is used for the integration, with 1000 conditional sample paths. 

The idea of considering $K$ evaluations at the same point in (3) is only an artificial 
construction, motivated by the fact that the numerical complexity of the computation of 
$J_n^K$ is the same as that of $J_n$. Indeed, it can be shown that the distribution of $\xi$ 
conditioned on $\mathcal{I}_{n+K}$ only depends in this case on $\xi_1^{\mathrm{obs}}, \ldots, \xi_n^{\mathrm{obs}}$ and on the batch mean 
$\bar{\xi}_{n+1}^{\mathrm{obs}} = \frac{1}{K} \sum_{k=1}^{K} \xi_{n+k}^{\mathrm{obs}} = \xi(x) + \frac{1}{K} \sum_{k=1}^{K} \varepsilon_{n+k}$. This has two consequences. First, the 
expectation in (3) is simply a one-dimensional integral with respect to the (conditional) 
distribution of $\bar{\xi}_{n+1}^{\mathrm{obs}}$, which is Gaussian, with mean equal to $\mathsf{E}_n(\xi(x))$ and variance equal 
to $\mathrm{var}_n(\xi(x)) + \sigma^2 / K$. Second, the simulation of sample paths of $\xi$ conditioned on the 
$n + K$ random variables $\xi_1^{\mathrm{obs}}, \ldots, \xi_{n+K}^{\mathrm{obs}}$ boils down to the simulation of sample paths of $\xi$ 
conditioned on the $n + 1$ random variables $\xi_1^{\mathrm{obs}}, \ldots, \xi_n^{\mathrm{obs}}, \bar{\xi}_{n+1}^{\mathrm{obs}}$. 
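
The reduction from $K$ repeated evaluations to a single averaged observation with noise variance $\sigma^2 / K$ can be checked numerically. The sketch below is only an illustration under assumed settings (squared-exponential covariance on a 51-point grid, arbitrary synthetic observations, and a hypothetical helper `gp_posterior`); it compares the posterior obtained by conditioning on all $n + K$ observations with the one obtained from the $n$ past observations plus the batch mean.

```python
import numpy as np

def gp_posterior(k_prior, idx_obs, y_obs, noise_vars):
    """Posterior mean and covariance of a zero-mean GP on a grid (heteroscedastic noise)."""
    k_oo = k_prior[np.ix_(idx_obs, idx_obs)] + np.diag(noise_vars)
    k_go = k_prior[:, idx_obs]
    gain = np.linalg.solve(k_oo, k_go.T).T      # equals k_go @ inv(k_oo), k_oo being symmetric
    return gain @ y_obs, k_prior - gain @ k_go.T

rng = np.random.default_rng(0)
grid = np.linspace(-1.0, 0.0, 51)
k_prior = np.exp(-0.5 * ((grid[:, None] - grid[None, :]) / 0.2) ** 2)  # assumed covariance
sigma2, K, j = 1.0, 10, 20     # noise variance, virtual batch size, index of the repeated point

idx_n = np.array([0, 25, 50])  # n = 3 past evaluation points
y_n = rng.normal(size=3)       # arbitrary past observations
y_batch = rng.normal(size=K)   # K arbitrary observations at grid[j]

# (a) Condition on all n + K observations.
mean_a, cov_a = gp_posterior(
    k_prior,
    np.concatenate([idx_n, np.full(K, j)]),
    np.concatenate([y_n, y_batch]),
    np.full(3 + K, sigma2),
)
# (b) Condition on the n past observations and on the batch mean, with variance sigma2 / K.
mean_b, cov_b = gp_posterior(
    k_prior,
    np.append(idx_n, j),
    np.append(y_n, y_batch.mean()),
    np.append(np.full(3, sigma2), sigma2 / K),
)
print(np.max(np.abs(mean_a - mean_b)), np.max(np.abs(cov_a - cov_b)))
```

Both differences should be at the level of floating-point round-off, which is why conditional simulations for the virtual batch cost no more than for a single evaluation.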


The optimization algorithm with the new criterion $J_n^K$ is available for testing in a 
development branch of the STK toolbox (Bect et al., 2014). 


3 Application 


The method is applied to the optimization of a strategy for the integration of Renewable 
Energy Sources (RES) into an electrical distribution network. This strategy describes 
how the Distribution System Operator (DSO) connects new producers to the network under 
strict economic, safety and regulatory requirements (Dutrieux et al., 2015a,b). Our 
objective is to find the optimal value of one parameter of the strategy, $x \in [-1, 0]$, so as to 
minimize the mean global cost of integrating about 20 megawatts of RES over 10 years. 

The objective function is $f(x) = \mathsf{E}_S(C(x, S))$, where $S$ denotes a 10-year scenario 
(consisting of several time series, together with the characteristics of RES connection 
requests), $\mathsf{E}_S$ the expectation with respect to a random scenario, and $C(x, S)$ the cost 
of the strategy with parameter $x$ applied to the scenario $S$. The computation of $C$ is 
performed by an expensive-to-evaluate computer program. We assume evaluations of 
the form $\xi_i^{\mathrm{obs}} = C(X_i, S_i)$, where $S_1, S_2, \ldots$ are independent scenarios generated by the 
same scenario generator (and therefore identically distributed). This can be rewritten 
as $\xi_i^{\mathrm{obs}} = f(X_i) + \varepsilon_i$, where the variables $\varepsilon_i = C(X_i, S_i) - f(X_i)$ are independent and 
have zero mean. As shown in Figure 2 (left), the evaluation results are very noisy in 
this application. For the sake of simplicity, the noise variance is assumed to be a known 
constant (estimated from a few evaluation results) and the variables $\varepsilon_i$, $i = 1, 2, \ldots$, 
will be assumed Gaussian. 
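
The text does not specify how the constant noise variance is estimated. One simple possibility, shown here purely as an assumption, is a pooled variance over replicated evaluations at a few points. The simulator `toy_cost` below is a hypothetical stand-in for the expensive program computing $C(x, S)$, with a known noise variance so that the estimate can be checked.

```python
import numpy as np

def pooled_noise_variance(replicates):
    """Pooled variance estimate; `replicates` holds one 1-D array of results per point."""
    sum_sq = sum(np.sum((r - r.mean()) ** 2) for r in replicates)
    dof = sum(len(r) - 1 for r in replicates)
    return sum_sq / dof

def toy_cost(x, rng, noise_var=4.0):
    """Hypothetical noisy cost: a smooth mean function plus a zero-mean scenario effect."""
    return (x + 0.4) ** 2 + rng.normal(scale=np.sqrt(noise_var))

rng = np.random.default_rng(0)
points = np.linspace(-1.0, 0.0, 5)
# 100 replicated evaluations at each of the 5 points (illustrative sizes only).
replicates = [np.array([toy_cost(x, rng) for _ in range(100)]) for x in points]
sigma2_hat = pooled_noise_variance(replicates)
print(f"estimated noise variance: {sigma2_hat:.2f} (true value 4.0 in this toy setting)")
```

Pooling removes the effect of the unknown mean at each point, so the estimate does not require any model of $f$; it does rely on the variance being the same at all points, which is the simplifying assumption made above.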
Figure 2: Left: Reference data. The search grid (on the x-axis) has $m = 51$ points. On each point, 
approx. 1000 evaluation results are available. The solid black line represents the empirical mean. Right: 
initial sample of 110 evaluations (11 batches of 10 evaluations). The dashed gray line represents the 
kriging mean. The grayed region represents pointwise credibility intervals with probability 95%. 


We consider a budget of 2000 evaluations (without the initial sample) to find the 
minimizer among 51 candidate points linearly spaced in $[-1, 0]$. A batch of $K_0 = 10$ 
evaluations is performed at each iteration. We compare three ways of constructing $\underline{X}$: 
using the sampling criterion $J_n^K$ with $K = K_0 = 10$ (denoted IAGO 10); using $J_n^K$ with 
$K = +\infty$ (denoted IAGO $+\infty$); and, as a reference, choosing $X_{n+1}$ at random, uniformly 
in the set of candidate points (denoted IID). The kriging model parameters are first 
estimated on an initial sample of 110 evaluations (11 batches of 10 evaluations, as shown 
in Figure 2, right), then adjusted after each new batch of evaluations. 

Figure 3 depicts the distribution of the estimated minimizer, the estimated minimum 
and the posterior entropy of the minimizer over the 500 optimization runs. IAGO $+\infty$ 
converges towards the area of interest faster than IID and IAGO 10. It is worth noting 
that a budget of 2000 evaluations does not suffice to locate the minimizer accurately. In 
fact, even 1000 evaluations at each candidate point (as in Figure 2, left) would not locate 
it much more precisely (result not shown). 

4 Conclusion 

We have proposed a new sampling criterion for the problem of global optimization in 
the presence of very noisy evaluations, constructed as if several evaluations were going to 
be made at each new evaluation point (even if they are not in practice). The proposed 
method has been applied to the optimization of a renewable energy integration strategy 
and shown to outperform plain IID sampling and the original IAGO criterion. 
Figure 3: Distribution of the estimated minimizer (left), the estimated minimum (center) and 
the posterior entropy $H_n$ of the minimizer (right) over 500 optimization runs. On each box, the central 
mark is the median, the edges of the box are the 25th and 75th percentiles, and the whiskers are the 
5th and 95th percentiles. The thick black lines indicate the value obtained on our reference dataset. 